An Approach to Design Incremental Parallel Webcrawler

نویسندگان

  • DIVAKAR YADAV
  • AK SHARMA
  • SONIA SANCHEZ-CUADRADO
  • JORGE MORATO
چکیده

World Wide Web (WWW) is a huge repository of interlinked hypertext documents known as web pages. Users access these hypertext documents via Internet. Since its inception in 1990, WWW has become many folds in size, and now it contains more than 50 billion publicly accessible web documents distributed all over the world on thousands of web servers and still growing at exponential rate. It is very difficult to search information from such a huge collection of WWW as the web pages or documents are not organized as books on shelves in a library, nor are web pages completely catalogued at one central location. Search engine is basic information retrieval tool, used to access information from WWW. In response to the search query provided by users, Search engines use their database to search the relevant documents and produce the result after ranking on the basis of relevance. In fact, the Search engine builds its database, with the help of WebCrawlers. To maximize the download rate and to retrieve the whole or significant portion of the Web, search engines run multiple crawlers in parallel. Overlapping of downloaded web documents, quality, network bandwidth and refreshing of web documents are the major challenging problems faced by existing parallel WebCrawlers that are addressed in this work. A Multi Threaded (MT) server based novel architecture for incremental parallel web crawler has been designed that helps to reduce overlapping, quality and network bandwidth problems. Additionally, web page change detection methods have been developed to refresh the web document by detecting the structural, presentation and content level changes in web documents. These change detection methods help to detect whether version of a web page, existing at Search engine side has got changed from the one existing at Web server end or not. If it has got changed, the WebCrawler should replace the existing version at Search engine database side to keep its repository up-to-date

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

AN OPTIMAL FUZZY SLIDING MODE CONTROLLER DESIGN BASED ON PARTICLE SWARM OPTIMIZATION AND USING SCALAR SIGN FUNCTION

This paper addresses the problems caused by an inappropriate selection of sliding surface parameters in fuzzy sliding mode controllers via an optimization approach. In particular, the proposed method employs the parallel distributed compensator scheme to design the state feedback based control law. The controller gains are determined in offline mode via a linear quadratic regular. The particle ...

متن کامل

An approach to fault detection and correction in design of systems using of Turbo ‎codes‎

We present an approach to design of fault tolerant computing systems. In this paper, a technique is employed that enable the combination of several codes, in order to obtain flexibility in the design of error correcting codes. Code combining techniques are very effective, which one of these codes are turbo codes. The Algorithm-based fault tolerance techniques that to detect errors rely on the c...

متن کامل

Design of power-efficient adiabatic charging circuit in 0.18μm CMOS technology

In energy supply applications for low-power sensors, there are cases where energy should be transmitted from a low-power battery to an output stage load capacitor. This paper presents an adiabatic charging circuit with a parallel switches approach that connects to a low-power battery and charges the load capacitor using a buck converter which operates in continuous conduction mode (CCM). A gate...

متن کامل

Incremental parallel and distributed systems

Incremental computation strives for efficient successive runs of applications by reexecuting only those parts of the computation that are affected by a given input change instead of recomputing everything from scratch. To realize the benefits of incremental computation, researchers and practitioners are developing new systems where the application programmer can provide an efficient update mech...

متن کامل

Incremental compilation for parallel logic verification systems

Although simulation remains an important part of application-specific integrated circuit (ASIC) validation, hardware-assisted parallel verification is becoming a larger part of the overall ASIC verification flow. In this paper, we describe and analyze a set of incremental compilation steps that can be directly applied to a range of parallel logic verification hardware, including logic emulators...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012